    Improving Thermodynamic Models of Transcription by Combining ChIP and Expression Measurements of Synthetic Promoters

    Regulation of gene expression is a fundamental process in biology, and accurate mathematical models of the relationship between regulatory sequence and observed expression would advance our understanding of it. I developed ReLoS, a regulatory logic simulator, to explore mathematical frameworks for describing this relationship and to explore methods of learning combinatorial regulatory rules from expression data. ReLoS is a flexible simulator that allows a variety of formalisms to be applied. I used ReLoS to ask how complex the rules of combinatorial transcriptional regulation must be to explain the complexity of transcriptional regulation observed in biology. A previously published dataset was analyzed for regulatory elements that explained the behavior of regulatory modules for 254 genes in 255 conditions. ReLoS recapitulated a reasonable fraction of the variation (mean gene-wise correlation of 0.7) with only 12 combinatorial rules comprising 13 cis-regulatory elements, suggesting that learning the combinatorial rules of transcriptional regulation should be possible. State ensemble statistical thermodynamic models are a class of models used to describe combinatorial transcriptional regulation. One way to parameterize these models is to measure the expression of a reporter gene driven by many similar synthetic promoters. Models parameterized in this fashion do better at explaining the relationship between sequence and expression, but they cannot distinguish between multiple biological mechanisms that give rise to equivalent expression in the synthetic promoters, which limits their generalizability. I developed a ChIP-based strategy for quantitatively measuring the relative occupancy of transcription factors on synthetic promoters. These data complement existing methods for obtaining expression data from the same promoters.
Comparison of models parameterized with only expression, only occupancy, or both expression and occupancy reveals specific biological details that are missed when only expression data are considered. In particular, the occupancy data suggest that the differential regulatory effects of Cbf1 in glucose versus amino acid conditions arise from how Cbf1 interacts with polymerase rather than from changes in its concentration or binding affinity. The occupancy data also suggest that Gcn4 binds cooperatively and that Gcn4 occupancy is reduced by the presence of a nearby Nrg1 site. Finally, the occupancy and expression data taken together suggest that Gcn4 binds in competition with another transcription factor. Synthesizing these disparate sources of information improved our understanding of the mechanics of transcriptional regulation of the synthetic promoters and was largely successful in decoupling DNA binding energies from TF interactions with polymerase. However, the results also suggest that more sophisticated models of the relationship between occupancy and expression may be required in at least some cases. Incorporating different sources of data into models of regulation will continue to be important for learning the biological specifics that drive expression changes.
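The distinction drawn above, between a TF changing how it interacts with polymerase and a TF changing its concentration or binding affinity, can be illustrated with a minimal one-site state-ensemble model. This is a hypothetical sketch, not the model fitted in this work: the four-state promoter and the parameter names (`tf_conc`, `tf_kd`, `pol_conc`, `pol_kd`, `pol_interaction`) are assumptions for illustration. TF occupancy is the summed weight of states with the TF bound, and expression is assumed to be proportional to the probability that polymerase is bound.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical parameters for a promoter with one TF site and one polymerase site."""
    tf_conc: float          # TF concentration (arbitrary units)
    tf_kd: float            # TF dissociation constant for its site
    pol_conc: float         # polymerase concentration (arbitrary units)
    pol_kd: float           # polymerase dissociation constant
    pol_interaction: float  # TF-polymerase factor: >1 activates, <1 represses, 1 is neutral


def predict(p: Params) -> tuple[float, float]:
    """Return (TF occupancy, relative expression) from a four-state ensemble."""
    q_tf = p.tf_conc / p.tf_kd
    q_pol = p.pol_conc / p.pol_kd
    # Statistical weights of the four promoter states.
    w_empty = 1.0
    w_tf = q_tf
    w_pol = q_pol
    w_both = q_tf * q_pol * p.pol_interaction
    z = w_empty + w_tf + w_pol + w_both   # partition function
    occupancy = (w_tf + w_both) / z       # probability the TF is bound
    expression = (w_pol + w_both) / z     # assumed proportional to P(polymerase bound)
    return occupancy, expression
```

In this toy model, changing `tf_conc` or `tf_kd` affects expression only through occupancy, whereas `pol_interaction` can change expression substantially while changing occupancy comparatively little (for example, when polymerase binding is weak). Measuring both quantities is what makes the two explanations separable.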

    Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data

    Many studies have identified binding preferences for transcription factors (TFs), but few have yielded predictive models of how combinations of TF binding sites generate specific levels of gene expression. Synthetic promoters have emerged as powerful tools for generating quantitative data to parameterize models of combinatorial cis-regulation. We sought to improve the accuracy of such models by quantifying the occupancy of TFs on synthetic promoters in vivo and incorporating these data into statistical thermodynamic models of cis-regulation. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we measured the occupancy of Gcn4 and Cbf1 in synthetic promoter libraries composed of binding sites for Gcn4, Cbf1, Met31/Met32 and Nrg1. We measured the occupancy of these two TFs and the expression levels of all promoters in two growth conditions. Models parameterized using only expression data predicted expression but failed to identify several interactions between TFs. In contrast, models parameterized with both occupancy and expression data predicted expression and also revealed Gcn4 self-cooperativity and a negative interaction between Gcn4 and Nrg1. Occupancy data also allowed us to distinguish between competing regulatory mechanisms for the factor Gcn4. Our framework for combining occupancy and expression data produces predictive models that better reflect the mechanisms underlying combinatorial cis-regulation of gene expression.
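The self-cooperativity of Gcn4 and the negative Gcn4-Nrg1 interaction described above can be illustrated by enumerating every binding configuration of a small promoter and summing statistical weights. This is a minimal sketch, not the fitted model: it assumes every pair of bound factors interacts (no spacing or distance dependence), and the relative weights `q` and pairwise factors `omega` are hypothetical inputs rather than values from this study. Here `omega > 1` encodes cooperativity and `omega < 1` a negative interaction.

```python
from itertools import product


def gcn4_occupancy(sites, q, omega):
    """Expected number of bound Gcn4 molecules on a promoter.

    sites: TF name for each binding site, e.g. ["Gcn4", "Gcn4", "Nrg1"]
    q:     hypothetical relative binding weight ([TF]/K) per TF name
    omega: hypothetical pairwise factor keyed by a sorted tuple of TF names
    """
    z = 0.0          # partition function
    weighted = 0.0   # sum over states of weight * (number of Gcn4 bound)
    for state in product((0, 1), repeat=len(sites)):
        bound = [tf for tf, occupied in zip(sites, state) if occupied]
        w = 1.0
        for tf in bound:
            w *= q[tf]
        # Simplifying assumption: every pair of bound factors interacts.
        for i in range(len(bound)):
            for j in range(i + 1, len(bound)):
                w *= omega.get(tuple(sorted((bound[i], bound[j]))), 1.0)
        z += w
        weighted += w * bound.count("Gcn4")
    return weighted / z
```

For example, `gcn4_occupancy(["Gcn4", "Gcn4"], {"Gcn4": 1.0}, {("Gcn4", "Gcn4"): 10.0})` gives 22/13 (about 1.69 bound Gcn4), above the value of 1.0 obtained without cooperativity. Exhaustive enumeration costs 2^n states, which is practical only for promoters with a handful of sites.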

    A cis-regulatory logic simulator

    Background: A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence. Results: We developed a gene expression simulator that generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined, and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence. Conclusion: We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduce experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.
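The kinds of rules the simulator supports can be sketched as a scoring function followed by optional Gaussian noise and a sigmoid transform. This is a hypothetical illustration, not the published simulator's source: the function `simulate`, its arguments, the rule semantics (cooperative pairs add a bonus, competitive pairs keep only the larger of two contributions), and the choice to add noise before the sigmoid are all assumptions. Only three of the interaction types (additive, cooperative, competitive) are shown, and spacing and orientation constraints are omitted.

```python
import math
import random


def simulate(promoter, weights, coop=None, compete=None, noise_sd=0.0, rng=None):
    """Simulated expression in (0, 1) for a promoter given as site names.

    Hypothetical rule semantics for illustration only.
    """
    coop = coop or {}
    compete = compete or set()
    present = set(promoter)
    # Additive: each distinct site contributes its own weight independently.
    score = sum(weights[site] for site in present)
    # Competitive: when both sites are present, only the larger weight counts.
    for a, b in compete:
        if a in present and b in present:
            score -= min(weights[a], weights[b])
    # Cooperative: a bonus that applies only when both sites are present.
    for (a, b), bonus in coop.items():
        if a in present and b in present:
            score += bonus
    # Optional Gaussian noise on the underlying score.
    if noise_sd > 0:
        rng = rng or random.Random()
        score += rng.gauss(0.0, noise_sd)
    # Logistic transform mimicking the sigmoid shape of real expression levels.
    return 1.0 / (1.0 + math.exp(-score))
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes a noisy simulated data set reproducible, which matters when the same test set is used to compare several prediction methods.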

    Risk Analysis of Prostate Cancer in PRACTICAL, a Multinational Consortium, Using 25 Known Prostate Cancer Susceptibility Loci

    BACKGROUND: Genome-wide association studies have identified multiple genetic variants associated with prostate cancer risk that explain a substantial proportion of familial relative risk. These variants can be used to stratify individuals by their risk of prostate cancer. METHODS: We genotyped 25 prostate cancer susceptibility loci in 40,414 individuals and derived a polygenic risk score (PRS). We estimated empirical odds ratios (OR) for prostate cancer associated with different risk strata defined by PRS and derived age-specific absolute risks of developing prostate cancer by PRS stratum and family history. RESULTS: The prostate cancer risk for men in the top 1% of the PRS distribution was 30.6-fold (95% CI, 16.4-57.3) that of men in the bottom 1%, and 4.2-fold (95% CI, 3.2-5.5) the median risk. The absolute risk of prostate cancer by age 85 years was 65.8% for a man with a family history in the top 1% of the PRS distribution, compared with 3.7% for a man in the bottom 1%. The PRS was only weakly correlated with serum PSA level (correlation = 0.09). CONCLUSIONS: Risk profiling can identify men at substantially increased or reduced risk of prostate cancer. The effect size, measured by OR per unit PRS, was higher in men at younger ages and in men with a family history of prostate cancer. Incorporating additional newly identified loci into a PRS should improve the predictive value of risk profiles. IMPACT: We demonstrate that risk profiling based on SNPs can identify men at substantially increased or reduced risk, which could have useful implications for targeted prevention and screening programs.
    D. F. Easton was the recipient of CR-UK grant C1287/A10118. R. A. Eeles was the recipient of CR-UK grant C5047/A10692, and B. E. Henderson was the recipient of NIH grant 1U19CA148537-01.
    This is the author accepted manuscript. The final version is available via AACR at http://cebp.aacrjournals.org/content/early/2015/04/02/1055-9965.EPI-14-0317.long
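The polygenic risk score mentioned in the Methods can be illustrated with the common log-additive construction, in which each risk allele contributes the logarithm of its per-allele odds ratio. This is a generic sketch, not necessarily the exact PRS definition used in PRACTICAL (which may, for example, have been standardized); the function names are illustrative, and any per-allele odds ratios passed in are hypothetical, not the 25 published loci.

```python
import math


def polygenic_risk_score(risk_allele_counts, per_allele_or):
    """Log-additive PRS: sum over SNPs of (risk allele count) * ln(per-allele OR)."""
    if len(risk_allele_counts) != len(per_allele_or):
        raise ValueError("need one odds ratio per genotyped SNP")
    return sum(g * math.log(o) for g, o in zip(risk_allele_counts, per_allele_or))


def relative_odds(prs_a, prs_b):
    """Under a multiplicative model, exp(PRS difference) is the OR comparing two men."""
    return math.exp(prs_a - prs_b)
```

For example, a man carrying two risk alleles at a SNP with a hypothetical per-allele OR of 1.2 and one at a SNP with OR 1.5 has `relative_odds(polygenic_risk_score([2, 1], [1.2, 1.5]), 0.0)` equal to 1.2² × 1.5 = 2.16 (up to floating-point rounding) relative to a man carrying no risk alleles.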